
Table of contents

  • 14.1 Dataset
  • 14.2 Set-up
    • 14.2.1 Quarto YAML
    • 14.2.2 Packages
    • 14.2.3 Data
  • 14.3 Model set up
    • 14.3.1 Variable transformations
    • 14.3.2 Model selection
  • 14.4 Linear mixed model
    • 14.4.1 Fit a model
    • 14.4.2 Report results
  • 14.5 Generalised linear mixed model
    • 14.5.1 Fit a model
    • 14.5.2 Report results
  • 14.6 Interpretation
  • 14.7 Render

14  Report 2

(Generalised) linear mixed models

The goal of this report is to review and consolidate what we learned together in the second block of the course. You are not required to do anything that we have not already covered.

For students enrolled in this course in the Winter Semester 2023/24: The report is due March 29, 2024 at 11:59pm. Please submit your Quarto script, as well as a rendered copy in HTML and PDF to Moodle (under ‘Reports’).

14.1 Dataset

For this report you will continue using the data from Biondo et al. (2022), an eye-tracking reading study of adverb-tense congruence effects on reading time measures. Participants’ eye movements were recorded as they read Spanish sentences in which temporal adverbs and verb tense were either congruent or incongruent. For both sentence regions, the time reference was either past (e.g., yesterday, bought) or future (e.g., tomorrow, will buy). Example stimuli from this experiment are given in Table 14.1.

Table 14.1: Example stimuli
| sentence | adverb | verb | gramm |
|----------|--------|------|-------|
| A la salida del trabajo, **ayer** las chicas **compraron** pan en la tienda.<br> *After leaving work* **yesterday** *the girls* **bought** *bread at the shop* | past | past | gramm |
| A la salida del trabajo, **ayer** las chicas **\*comprarán** pan en la tienda.<br> *After leaving work* **yesterday** *the girls* **\*will buy** *bread at the shop* | past | future | ungramm |
| A la salida del trabajo, **mañana** las chicas **comprarán** pan en la tienda.<br> *After leaving work* **tomorrow** *the girls* **will buy** *bread at the shop* | future | future | gramm |
| A la salida del trabajo, **mañana** las chicas **\*compraron** pan en la tienda.<br> *After leaving work* **tomorrow** *the girls* **\*bought** *bread at the shop* | future | past | ungramm |

You will be fitting models to different eye-tracking reading measures from this experiment, with the predictors adverb time and grammaticality.

14.2 Set-up

Make sure you begin with a clear working environment. To achieve this, you can go to Session > Restart R. Your Environment should have no objects in it, and you should not have any packages loaded.

14.2.1 Quarto YAML

Make sure your YAML looks something like this:

---
title: "Report 2"
author: "My Name"
format:
  html: default
  pdf: default
toc: true
number-sections: true
---

**Render often**

I suggest you render your document frequently, e.g., after every substantial code chunk or completed task. This ensures earlier detection of broken code and makes problems easier to fix. Do this for both HTML and PDF.

14.2.2 Packages

Load the following packages, however you prefer (i.e., you don’t have to use pacman::p_load()); one possible approach is sketched after the list:

  • tidyverse
  • janitor
  • here
  • broom.mixed
  • lattice
  • lme4
  • lmerTest
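
For example, a minimal sketch using pacman (assuming pacman itself is already installed; seven separate library() calls work just as well):

```{r}
# Load all required packages in one call (p_load() also installs any that are missing)
pacman::p_load(tidyverse, janitor, here, broom.mixed, lattice, lme4, lmerTest)
```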

Describe what each of the following packages is used for (in our experience, each offers many more useful functions than the ones we’ve tried).

  1. broom.mixed:
  2. lattice:
  3. lme4:
  4. lmerTest:

14.2.3 Data

Load in the Biondo et al. (2022) data by running the following code chunk.

```{r}
df_biondo <-
  read_csv(here("data", "Biondo.Soilemezidi.Mancini_dataset_ET.csv"),
           locale = locale(encoding = "Latin1") ## for special characters in Spanish
           ) |>
  clean_names() |>
  mutate(gramm = ifelse(gramm == "0", "ungramm", "gramm")) |>
  mutate_if(is.character, as_factor) |> # all character variables as factors
  filter(adv_type == "Deic") |>
  droplevels() |>
  mutate(
    roi_length = str_length(label)
  ) |>
  relocate(roi_length, .after = label)
```

The last few lines add a new variable (roi_length) that contains region length (in letters). We will use this as a covariate in one of our models.
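
After loading, it is worth a quick sanity check of the resulting data frame, for example:

```{r}
# Quick look at column names, types, and a preview of the values
glimpse(df_biondo)
```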

14.3 Model set up

You will be asked to run two models: one linear mixed model (lmer() from the lme4 or lmerTest package) and one generalised (logistic) linear mixed model (glmer(family = "binomial") from the lme4 package).

14.3.1 Variable transformations

For each model, consider whether you need to implement the following steps (a rough sketch of the transformation steps is given after this list):

  • centre (sum contrast code) categorical predictors
  • standardize continuous predictors (e.g., using the scale() function)
  • log-transform continuous dependent variables if skewed
  • model selection: begin with a maximal model
    • simplify in case of nonconvergence or singular fit
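
As an illustration, the first three steps might look something like the sketch below. The column names (tt, roi_length) and the gramm levels are taken from the task description and the loading code above; check the levels of adv_t (and any other factor) yourself before coding it the same way.

```{r}
df_biondo <- df_biondo |>
  mutate(
    # centre/sum code a two-level factor as -0.5/+0.5
    gramm_sum = ifelse(gramm == "gramm", -0.5, +0.5),
    # standardise a continuous predictor (as.numeric() drops the matrix attributes)
    roi_length_z = as.numeric(scale(roi_length)),
    # log-transform a skewed continuous dependent variable (assumes positive values)
    log_tt = log(tt)
  )
```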

14.3.2 Model selection

For each model, start with a “maximal” model justified by the design. If you encounter convergence issues, begin by implementing “unintrusive” remedies. If you still have convergence issues (as indicated by warning messages and/or by, e.g., inspecting the variance-covariance matrix), reduce the random effects structure as you see fit. Be sure to document and justify your decisions step by step. N.B., the equivalent of the lmerControl argument (for lmer() models) is glmerControl for glmer() models.

If you choose to use the lme4::allFit() function, beware that it can take a long time to run, especially on ‘maximal’ models. I suggest you (i) save the output as an object (e.g., allFit_model1 <- allFit(model1)) and (ii) plan another task that doesn’t involve running code when you run this function.
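
If it helps, here is a rough sketch of both ideas; model1 is a placeholder for a model you have already fitted, and this is a starting point rather than a recipe:

```{r}
#| eval: false

# Refit the model with every available optimizer (this can take a while)
allFit_model1 <- allFit(model1)
summary(allFit_model1)$which.OK  # which optimizers ran without error
summary(allFit_model1)$msgs      # convergence messages, per optimizer

# Switch optimizer via the control argument
# (lmerControl() for lmer() models, glmerControl() for glmer() models)
model1_bobyqa <- update(model1, control = lmerControl(optimizer = "bobyqa"))
```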

I am not expecting one particular model or random effects structure to be “correct”; rather, I am looking for explanations of how you decided what to remove from or keep in your model.

14.4 Linear mixed model

Fit a linear mixed model to total reading times (tt) at the adverb region (roi == 2). Your fixed effects are adverb time reference (adv_t), grammaticality (gramm), their interaction, and (standardized) region length in characters as a covariate without any interaction. Include by-participant and -item random effects.

14.4.1 Fit a model

Start by defining the maximal model justified by your design, and simplify accordingly. Remember not to delete the code for nonconverging models; instead, set the code chunk to not run when you render your document, as in the code chunk below (#| eval: false).

```{r}
#| eval: false

fit_some_maximal_model <- 
  lmer(dependent_variable ~ predictor1*predictor2 + covariate +
         (1 + predictor1*predictor2|participant) +
         (1 + predictor1*predictor2|item),
       data = my_data,
       subset = some_factor == "some_level")
# informative comment, e.g., "didn't converge"
```

14.4.2 Report results

Once you’ve landed on a final model that converges, inspect the fixed and random effects (some useful functions we’ve already seen: summary(), broom.mixed::tidy(), fixef(), ranef(), coef(), lattice::dotplot()).
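
For example, with fit_final as a placeholder name for your final model:

```{r}
#| eval: false

summary(fit_final)                   # full summary: fixed and random effects
broom.mixed::tidy(fit_final)         # the same estimates as a tidy data frame
fixef(fit_final)                     # fixed-effect coefficients only
ranef(fit_final)                     # by-participant and by-item adjustments
coef(fit_final)                      # fixed and random effects combined, per group
lattice::dotplot(ranef(fit_final))   # plot the random effects
```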

14.5 Generalised linear mixed model

We didn’t cover how to implement logistic mixed regression; however, the relationship between lmer() and glmer() is the same as that between lm() and glm().
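
Schematically (with placeholder variable names), the parallel looks like this:

```{r}
#| eval: false

lm(y ~ x, data = d)                         # linear model
glm(y ~ x, data = d, family = "binomial")   # logistic regression
lmer(y ~ x + (1 | subject), data = d)       # linear mixed model
glmer(y ~ x + (1 | subject), data = d,      # logistic mixed model
      family = "binomial")
```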

14.5.1 Fit a model

Fit a generalised linear mixed model (glmer() from the lme4 package; lmerTest does not have this function) to regressions in (ri) at the adverb region (roi == 2). Your fixed effects are adverb time reference (adv_t), grammaticality (gramm), and their interaction. Remember to use eval: false in your code chunk options to stop Quarto from running all your non-final models when rendering.
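
One possible starting point is sketched below. The participant and item column names (here sj and item) are assumptions on my part, so check names(df_biondo) and adjust them, and simplify the random effects structure as needed.

```{r}
#| eval: false

fit_ri_max <-
  glmer(ri ~ adv_t * gramm +
          (1 + adv_t * gramm | sj) +     # assumed participant column name
          (1 + adv_t * gramm | item),    # assumed item column name
        data = df_biondo,
        subset = roi == 2,
        family = "binomial")
```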

14.5.2 Report results

Once you’ve landed on a final model that converges, inspect the fixed and random effects (some useful functions we’ve already seen: summary(), broom.mixed::tidy(), fixef(), ranef(), coef(), lattice::dotplot()).

Recall that our coefficient estimates are in log odds. The interpretation of your coefficient estimates (fixed effects) is identical to that in generalised linear models (i.e., without random effects).
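
If it helps your interpretation, the estimates can be converted back from the log-odds scale (fit_ri is a placeholder for your final model):

```{r}
#| eval: false

fixef(fit_ri)          # estimates on the log-odds scale
exp(fixef(fit_ri))     # odds ratios
plogis(fixef(fit_ri))  # probabilities (most meaningful for the intercept)
```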

14.6 Interpretation

Write a short report of the findings from the two models. Produce a table and a plot, as in the example above, to supplement your report.
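
One possible way to produce such a table (this may differ from the example referenced above; knitr ships with Quarto, and fit_final is a placeholder name):

```{r}
#| eval: false

# Tidy the fixed effects and print them as a formatted table
broom.mixed::tidy(fit_final, effects = "fixed") |>
  knitr::kable(digits = 3)
```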

14.7 Render

Render your finished Quarto script. Upload the .qmd, .pdf, and .html files to Moodle. N.B., you need to have tinytex installed to be able to render PDFs.

Biondo, N., Soilemezidi, M., & Mancini, S. (2022). Yesterday is history, tomorrow is a mystery: An eye-tracking investigation of the processing of past and future time reference during sentence reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 48(7), 1001–1018. https://doi.org/10.1037/xlm0001053